Smart Paper Notes

Swin MAE: Masked Autoencoders for Small Datasets

Zi'an Xu, Yin Dai, Fayu Liu, Weibing Chen, Yue Liu, Lifu Shi, Sheng Liu, Yuhang Zhou

Categories: Computer Vision | Artificial Intelligence

2022-12-28

The development of deep learning models in medical image analysis is largely limited by the lack of large, well-annotated datasets. Unsupervised learning does not require labels and is therefore better suited to medical image analysis problems. However, most current unsupervised learning methods need to be applied to large datasets. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with Swin Transformer as its backbone. Even on a dataset of only a few thousand medical images and without using any pre-trained models, Swin MAE is still able to learn useful semantic features purely from images. In transfer learning on downstream tasks, it can equal or even slightly outperform the supervised model obtained by training Swin Transformer on ImageNet. The code will be publicly available soon.
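
The abstract above does not say how patches are masked, so the following is only a minimal sketch of the generic masked-autoencoder idea: hide a random subset of image patches and train the model to reconstruct them. The function name `random_patch_mask`, the uniform random sampling, and the default ratio of 0.75 are illustrative assumptions, not details taken from Swin MAE.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return one boolean per patch; True marks a patch hidden from the encoder.

    Assumption: plain uniform random masking. The masking strategy actually
    used by Swin MAE is not described in the abstract.
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


# Example: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
mask = random_patch_mask(196, mask_ratio=0.75, seed=0)
visible = [i for i, hidden in enumerate(mask) if not hidden]
```

In such a setup the encoder would see only the `visible` patches, and the reconstruction loss would be computed on the hidden ones.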

Translated by Google Translate

Avoiding spurious correlations via logit correction

Sheng Liu, Xu Zhang, Nitesh Sekhar, Yue Wu, Prateek Singhal, Carlos Fernandez-Granda

Categories: Machine Learning | Natural Language Processing | Computer Vision | Machine Learning (Statistics)

2022-12-02

Empirical studies suggest that machine learning models trained with empirical risk minimization (ERM) often rely on attributes that may be spuriously correlated with the class labels. Such models typically perform poorly at inference on data lacking these correlations. In this work, we explicitly consider a situation where potential spurious correlations are present in the majority of training data. In contrast with existing approaches, which use the ERM model outputs to detect the samples without spurious correlations and then heuristically upweight or upsample those samples, we propose the logit correction (LC) loss, a simple yet effective improvement on the softmax cross-entropy loss, to correct the sample logit. We demonstrate that minimizing the LC loss is equivalent to maximizing the group-balanced accuracy, so the proposed LC can mitigate the negative impacts of spurious correlations. Our extensive experimental results further reveal that the proposed LC loss outperforms the SoTA solutions on multiple popular benchmarks by a large margin, an average 5.5% absolute improvement, without access to spurious attribute labels. LC is also competitive with oracle methods that make use of the attribute labels. Code is available at https://github.com/shengliu66/LC.

CELLS: A Parallel Corpus for Biomedical Lay Language Generation

Yue Guo, Wei Qiu, Gondy Leroy, Sheng Wang, Trevor Cohen

Categories: Natural Language Processing

2022-11-07

Recent lay language generation systems have used Transformer models trained on a parallel corpus to increase health information accessibility. However, the applicability of these models is constrained by the limited size and topical breadth of available corpora. We introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. The abstract and the corresponding lay language summary are written by domain experts, assuring the quality of our dataset. Furthermore, qualitative evaluation of expert-authored plain language summaries has revealed background explanation as a key strategy to increase accessibility. Such explanation is challenging for neural models to generate because it goes beyond simplification by adding content absent from the source. We derive two specialized paired corpora from CELLS to address key challenges in lay language generation: generating background explanations and simplifying the original abstract. We adopt retrieval-augmented models as an intuitive fit for the task of background explanation generation, and show improvements in summary quality and simplicity while maintaining factual correctness. Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the path for disseminating scientific knowledge to a broader audience. CELLS is publicly available at: https://github.com/LinguisticAnomalies/pls_retrieval.

Sustainable AI Processing at the Edge

Sébastien Ollivier, Sheng Li, Yue Tang, Chayanika Chaudhuri, Peipei Zhou, Xulong Tang, Jingtong Hu, Alex K. Jones

Categories: Artificial Intelligence

2022-07-04

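Edge computing is a popular target for accelerating machine learning algorithms that support mobile devices, since it avoids the communication latency of processing them in the cloud. Edge deployments of machine learning primarily consider traditional concerns such as the SWaP constraints (size, weight, and power) of their installations. However, such metrics are not sufficient to capture the environmental impact of computing, given the significant contributions of embodied energy and carbon. In this paper, we explore the trade-offs of convolutional neural network acceleration engines for both inference and online training. In particular, we explore the use of processing-in-memory (PIM) approaches, mobile GPU accelerators, and recently released FPGAs, and compare them with a novel Racetrack memory PIM. Replacing PIM-enabled DDR3 with Racetrack memory PIM can recover its embodied energy in as little as one year. For high activity ratios, mobile GPUs can be more sustainable, but they have a higher embodied energy to overcome than PIM-enabled Racetrack memory.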

PolyU-BPCoMa: A Dataset and Benchmark Towards Mobile Colorized Mapping Using a Backpack Multisensorial System

Wenzhong Shi, Pengxin Chen, Muyang Wang, Sheng Bao, Haodong Xiang, Yue Yu, Daping Yang

Categories: Computer Vision

2022-06-15

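Constructing colorized point clouds from mobile laser scanning and images is a fundamental task in surveying and mapping. It is also an essential prerequisite for building digital twins of smart cities. However, existing public datasets are either relatively small in scale or lack accurate geometric and color ground truth. This paper documents a multisensorial dataset named PolyU-BPCoMa, which is distinctively positioned towards mobile colorized mapping. The dataset incorporates 3D LiDAR, spherical imaging, GNSS, and IMU resources on a backpack platform. Color checker boards are pasted in each surveyed area as targets, and ground-truth data are collected by an advanced terrestrial laser scanner (TLS). 3D geometric information and color information can be recovered from the colorized point clouds produced by the backpack system and the TLS, respectively. We therefore provide an opportunity to benchmark mapping accuracy and colorization accuracy simultaneously for a mobile multisensorial system. The dataset is approximately 800 GB in size and covers both indoor and outdoor environments. The dataset and development kits are available at https://github.com/chenpengxin/polyu-bpcoma.git.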

Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction

Yue Cao, XiaoJiang Zhou, Jiaqi Feng, Peihao Huang, Yao Xiao, Dayao Chen, Sheng Chen

Categories: Artificial Intelligence

2022-05-20

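Rich user behavior data has proven to be of great value for click-through rate (CTR) prediction applications, especially in industrial recommender, search, and advertising systems. However, it is non-trivial for real-world systems to make full use of long-term user behaviors because of the strict requirements on online serving time. Most previous works adopt a retrieval-based strategy, in which a small number of user behaviors are first retrieved for subsequent attention. However, retrieval-based methods are sub-optimal and cause some information loss, and it is difficult to balance the effectiveness and efficiency of the retrieval algorithm. In this paper, we propose SDIM (Sampling-based Deep Interest Modeling), a simple yet effective sampling-based end-to-end approach for modeling long-term user behaviors. We sample from multiple hash functions to generate hash signatures for the candidate item and for each item in the user behavior sequence, and obtain the user interest by directly gathering the behavior items that share a hash signature with the candidate item. We show theoretically and experimentally that the proposed method performs on par with standard attention-based models in modeling long-term user behaviors, while being faster. We also describe the deployment of SDIM in our system. Specifically, we decouple behavior sequence hashing, which is the most time-consuming part, from the CTR model by designing a separate module named BSE (Behavior Sequence Encoding). BSE is latency-free for the CTR server, enabling us to model extremely long user behaviors. Both offline and online experiments are conducted to demonstrate the effectiveness of SDIM. SDIM is now deployed online in the search system of the Meituan app.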
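
The hashing-and-gathering step described above can be sketched roughly as follows. This is a minimal illustration, not the deployed SDIM code: the abstract does not name the hash family, so sign-of-random-projection (SimHash-style) hashing, the grouping of bits by `width`, the sum pooling with L2 normalization, and all function names are assumptions made for this example.

```python
import math
import random


def make_hyperplanes(num_hashes, dim, seed=0):
    """Draw random Gaussian hyperplanes for sign-based hashing (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector has a non-negative projection on it."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def sampled_interest(candidate, behaviors, planes, width=2):
    """Pool the behavior embeddings whose hash signature collides with the candidate's.

    The bits are split into groups of `width`; a behavior item is gathered once
    for every group in which its bits equal the candidate's bits.
    """
    cand_sig = signature(candidate, planes)
    pooled = [0.0] * len(candidate)
    for item in behaviors:
        item_sig = signature(item, planes)
        for start in range(0, len(planes), width):
            if item_sig[start:start + width] == cand_sig[start:start + width]:
                pooled = [a + b for a, b in zip(pooled, item)]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled


planes = make_hyperplanes(num_hashes=8, dim=4, seed=1)
candidate = [1.0, 0.0, 0.0, 0.0]
behaviors = [[1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
interest = sampled_interest(candidate, behaviors, planes)
```

Items that point in a similar direction to the candidate collide in more groups and therefore contribute more to the pooled vector. Because a behavior item's signature does not depend on the candidate, the signatures can be computed ahead of time, which is consistent with the abstract's point about moving sequence hashing into a separate module.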

Adversarial Memory Networks for Action Prediction

Zhiqiang Tao, Yue Bai, Handong Zhao, Sheng Li, Yu Kong, Yun Fu

Categories: Computer Vision

2021-12-18

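Action prediction aims to infer a forthcoming human action from a partially observed video, which is a challenging task because early observations carry limited information. Existing methods mainly adopt a reconstruction strategy for this task, expecting to learn a single mapping function from partial observations to full videos to facilitate the prediction process. In this study, we propose adversarial memory networks (AMemNet) to generate the "full video" feature conditioned on a partial video query, from two new aspects. First, a key-value structured memory generator is designed to memorize different partial videos as key memories and to dynamically write full videos into value memories with a gating mechanism and querying attention. Second, we develop a class-aware discriminator to guide the memory generator to deliver full video features that are not only realistic but also discriminative under adversarial training. The final prediction of AMemNet is given by late fusion over the RGB and optical flow streams. Extensive experimental results on two benchmark video datasets, UCF-101 and HMDB51, are provided to demonstrate the effectiveness of the proposed AMemNet model over state-of-the-art methods.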
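
The late-fusion step mentioned above can be illustrated with a small sketch. The abstract does not specify the fusion rule, so the weighted average of per-stream softmax scores, the `rgb_weight` parameter, and the function names below are assumptions chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Blend per-class scores of the two streams and return (scores, predicted class).

    Assumption: a convex combination of softmax scores; AMemNet's exact
    fusion rule is not given in the abstract.
    """
    rgb = softmax(rgb_logits)
    flow = softmax(flow_logits)
    fused = [rgb_weight * r + (1.0 - rgb_weight) * f for r, f in zip(rgb, flow)]
    return fused, max(range(len(fused)), key=fused.__getitem__)


# The RGB stream favors class 0, the flow stream strongly favors class 1.
scores, label = late_fusion([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
```

Each stream is scored independently and only the final class scores are combined, which is what distinguishes late fusion from merging features earlier in the network.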

ForgeryNet -- Face Forgery Analysis Challenge 2021: Methods and Results

Yinan He, Lu Sheng, Jing Shao, Ziwei Liu, Zhaofan Zou, Zhizhi Guo, Shan Jiang, Curitis Sun, Guosheng Zhang, Keyao Wang

Categories: Computer Vision

2021-12-15

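The rapid progress of photorealistic synthesis techniques has reached a critical point at which the boundary between real and manipulated images starts to blur. Recently, ForgeryNet, a mega-scale deep face forgery dataset comprising 2.9 million images and 221,247 videos, has been released. It is by far the largest in terms of data scale, manipulations (7 image-level approaches, 8 video-level approaches), perturbations (36 independent and more mixed perturbations), and annotations (6.3 million classification labels, 2.9 million manipulated-area annotations, and 221,247 temporal forgery segment labels). This paper reports the methods and results of the ForgeryNet -- Face Forgery Analysis Challenge 2021, which employs the ForgeryNet benchmark. Model evaluation is conducted offline on a private test set. A total of 186 participants took part in the competition, and 11 teams made valid submissions. We analyze the top-ranked solutions and present some discussion of future work directions.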

Implicit Transformer Network for Screen Content Image Continuous Super-Resolution

Jingyu Yang, Sheng Shen, Huanjing Yue, Kun Li

Categories: Computer Vision

2021-12-12

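Nowadays, screen content is growing explosively due to the wide application of screen sharing, remote cooperation, and online education. To match limited terminal bandwidth, high-resolution (HR) screen content may be downsampled and compressed. At the receiver side, super-resolution (SR) of low-resolution (LR) screen content images (SCIs) is in high demand, either for HR displays or for users who zoom in to observe details. However, image SR methods, which are mostly designed for natural images, do not generalize well to SCIs because of their very different image characteristics and the requirement to browse SCIs at arbitrary scales. To this end, we propose a novel Implicit Transformer Super-Resolution Network (ITSRN) for SCI SR. For high-quality continuous SR at arbitrary ratios, the proposed implicit transformer infers pixel values at query coordinates from image features at key coordinates, and an implicit position encoding scheme is proposed to aggregate neighboring pixel values similar to that of the query. We construct the benchmark SCI1K and SCI1K-compression datasets with LR and HR SCI pairs. Extensive experiments show that the proposed ITSRN significantly outperforms several competitive continuous and discrete SR methods on both compressed and uncompressed SCIs.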

A Latent Encoder Coupled Generative Adversarial Network (LE-GAN) for Efficient Hyperspectral Image Super-resolution

Yue Shi, Liangxiu Han, Lianghao Han, Sheng Chang, Tongle Hu, Darren Dancey

Categories: Computer Vision

2021-11-16

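Realistic hyperspectral image (HSI) super-resolution (SR) techniques aim to generate a high-resolution (HR) HSI with higher spectral and spatial fidelity from its low-resolution (LR) counterpart. The generative adversarial network (GAN) has proven to be an effective deep learning framework for image super-resolution. However, the optimization process of existing GAN-based models frequently suffers from mode collapse, which limits their capacity for spectral-spatial invariant reconstruction. This may cause spectral-spatial distortion in the generated HSI, especially with a large upscaling factor. To alleviate mode collapse, this work proposes a novel GAN model coupled with a latent encoder (LE-GAN), which maps the generated spectral-spatial features from the image space to the latent space and produces a coupling component to regularize the generated samples. Essentially, we treat an HSI as a high-dimensional manifold embedded in a latent space. The optimization of GAN models is thus converted into the problem of learning the distribution of high-resolution HSI samples in the latent space, bringing the distribution of the generated super-resolution HSIs closer to that of their original high-resolution counterparts. We have conducted experimental evaluations of the model's super-resolution performance and of its capability to alleviate mode collapse. The proposed approach has been tested and validated on two real HSI datasets acquired with different sensors (i.e., AVIRIS and UHD-185) for various upscaling factors and added noise levels, and compared with state-of-the-art super-resolution models (i.e., HyCoNet, LTTR, BAGAN, SR-GAN, WGAN).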
